Skip to content

RAT-558: Add security threat model (THREAT_MODEL.md + SECURITY.md + AGENTS.md)#677

Open
potiuk wants to merge 7 commits into
apache:masterfrom
potiuk:asf-security/threat-model-2026-06-10
Open

RAT-558: Add security threat model (THREAT_MODEL.md + SECURITY.md + AGENTS.md)#677
potiuk wants to merge 7 commits into
apache:masterfrom
potiuk:asf-security/threat-model-2026-06-10

Conversation

@potiuk

@potiuk potiuk commented Jun 10, 2026

Copy link
Copy Markdown
Member

What

Adds a threat model for Apache Creadur (RAT) at the Creadur PMC's request (GLASSWING / Mythos scan pre-flight):

  • THREAT_MODEL.md — the model (rubric).
  • SECURITY.md + AGENTS.md — disclosure pointer + the AGENTS.md -> SECURITY.md -> THREAT_MODEL.md chain.

The model in brief

RAT is modelled as an in-process build/CLI license-audit tool — not a network service, and explicitly not a security/vulnerability scanner. Its security-relevant case is auditing untrusted input: the XML configuration (XXE surface) and archive descent (decompression-bomb surface). Findings that require RAT to process input the operator already trusts (the normal case — your own source tree) are out of model.

DRAFT — you own it; two quick technical confirmations

Because RAT is small, the §8-vs-§9 split hinges on two facts I've left as section 14 questions:

  • Q3 — does XMLConfigurationReader disable DOCTYPE/external entities (XXE-safe)?
  • Q4 — does ArchiveWalker bound decompression (size/depth/entry-count)?

Your answers turn those from "open question" into either a provided property (§8) or a documented gap + downstream note (§9). Also Q6: want me to add the same chain to creadur-whisker and creadur-tentacles so all three are discoverable?

Generated by the ASF Security team's threat-model tooling (Claude Opus); reviewed before opening.

@ottlinger ottlinger changed the title Add security threat model (THREAT_MODEL.md + SECURITY.md + AGENTS.md) RAT-558: Add security threat model (THREAT_MODEL.md + SECURITY.md + AGENTS.md) Jun 11, 2026
Rebased onto current master, which already added AGENTS.md and SECURITY.md. Keeps both maintainer files and adds the detailed THREAT_MODEL.md plus the AGENTS.md -> SECURITY.md -> THREAT_MODEL.md pointers.

Generated-by: Claude Opus 4.8 (1M context)
@potiuk potiuk force-pushed the asf-security/threat-model-2026-06-10 branch from 35879b0 to d4f0fdd Compare June 14, 2026 01:19
@Claudenw

Copy link
Copy Markdown
Contributor

This PR looks like it needs answers from developers before submitting.
@potiuk is that the expected flow? We answer questions and push the resulting file(s) to the repo?

@potiuk

potiuk commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

Yes. Absolutely. - it's enough we just comment in the PR answering the questions and I will update the PR accordingly

Comment thread THREAT_MODEL.md
Comment thread THREAT_MODEL.md
Comment thread THREAT_MODEL.md Outdated
@ottlinger ottlinger requested a review from Claudenw June 16, 2026 12:40
@Claudenw

Copy link
Copy Markdown
Contributor

@potiuk

In answer to the first question, there is a PR to ensure we have this covered.
In answerto the second question, we utilize Apache Common Compress ArchiveStreamFactory class to create archive streams. If the filters indicate that we should process the archived file we extract the contents into an in-memory buffer where it is held until the document is processed. So we could hit OOM.

Incorporates the Creadur PMC's PR apache#677 review:
- archive walker confirmed unbounded (in-memory extraction) -> §9 gap + §10
- XML/DOCTYPE hardening noted as in-flight PMC PR (§14 Q3, link pending)
- documents RAT write mode (--addLicense) as trusted-input / out-of-model
- notes CLI/Ant/Maven front-ends are generated from a common core
- §15 corrected: SECURITY.md already exists (added via apache#671)

Generated-by: Claude Opus 4.8
@potiuk

potiuk commented Jun 17, 2026

Copy link
Copy Markdown
Member Author

Thanks Claudenw and ottlinger — folded your review into the draft and pushed (THREAT_MODEL.md, +53/-21):

  • Archive handling (your Q on OOM): confirmed as a gap. The unbounded in-memory extraction via Commons Compress ArchiveStreamFactory is now §9 (not a provided property), with the matching §10 downstream responsibility (sandbox / resource-limit RAT runs over untrusted archives) and a VALID-HARDENING disposition. Marked maintainer-confirmed.

  • XML/DOCTYPE (your "there is a PR to ensure we have this covered"): I read that as the XXE-hardening question (§14 Q3). I've noted a hardening PR is in flight but left §8 Gradle RAT: Add plainOutput, xmlOutput and htmlOutput options #2 tentative pending the link — could you drop the PR number here so I can cite it? Once it lands I'll flip XXE to a provided property.

  • ottlinger — write mode: good catch. RAT's --addLicense / editor mode writes headers into the audited tree; I've documented it explicitly in §2/§3 as operator-invoked on the operator's own (trusted) sources → it's OUT-OF-MODEL: trusted-input, but now stated rather than silent.

  • ottlinger — generated front-ends: added a §2 note that CLI/Ant/Maven are generated from a common option core, so a security property (or gap) in the core transfers to all three UIs.

  • ottlinger — SECURITY.md / RAT-558: Add proposals for AGENTS and SECURITY configuration - to be discussed #671: thanks, fixed the stale §15 line — it now reflects that RAT-558: Add proposals for AGENTS and SECURITY configuration - to be discussed #671 added SECURITY.md and this PR just appends the AGENTS.md -> SECURITY.md -> THREAT_MODEL.md pointer.

Still open if you have a moment (one line each is plenty): Q1 (confirm the untrusted-input case is the one to model), Q2 (RAT makes no network connections), Q5 (Whisker/Tentacles share the profile), and Q6 (want us to add the same pointer files to creadur-whisker/-tentacles, or will you?).

One note on CI: the failing "Build and analyze" (CodeQL) check is unrelated to this PR — it's a docs-only change (three .md files), so it isn't introducing or affected by that build job; looks pre-existing/flaky on the branch.

@Claudenw

Claudenw commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

@potiuk There is one more point that has not been discussed. RAT allows developers to extend the matching algorithms. See https://creadur.apache.org/rat/license_def.html#Matchers

The upshot is that 3rd parties can create new matchers and use them in license checks. Matchers are different from license checks in that license checks use matchers. For example the Apache 2.0 license check uses the text, spdx and any matchers.

Matchers scan the contents of the file (as a String) looking for matches. This means that a custom matcher would have access to all text from all files that are selected for scanning.

But this is defined in the configuration and is under control of the developer using RAT.

@ottlinger

Copy link
Copy Markdown
Contributor

@potiuk the Sonarbuild does only run on specific branches/with specific PRs as the credentials are not shared among all PRs/builds due to ASF restrictions.

@Claudenw

Copy link
Copy Markdown
Contributor
  • could you drop the PR number here so I can cite it? Once it lands I'll flip XXE to a provided property.

#679 is the PR that does the XXE hardening.

Per Claudenw (PR apache#677): RAT lets operators define custom matcher classes that
see all scanned file text, but the matcher set is operator-defined config (not
attacker-supplied), so it's OUT-OF-MODEL: trusted-input — same posture as the
write mode.

Generated-by: Claude Opus 4.8 (1M context)
@potiuk

potiuk commented Jun 18, 2026

Copy link
Copy Markdown
Member Author

Thanks Claudenw — folded the custom-matcher surface into §3: RAT lets operators define custom matcher classes that see the full text of every scanned file, but since the matcher set is operator-defined config (not attacker-supplied), a custom matcher reading scanned text is OUT-OF-MODEL: trusted-input — the same posture as the write mode. Marked maintainer-confirmed.

ottlinger — thanks for the CI clarification; that matches what we expected (the Sonar/CodeQL job is credential-gated to specific branches under ASF restrictions, so its red on this fork PR is environmental, not anything this docs-only change introduces).

Still one open item: the XXE-hardening PR number (§14 Q3) — I've left §8 #2 tentative pending it. Whenever you drop the number I'll cite it and flip XXE from "hardening in flight" to a provided property. No rush.

The remaining §14 questions (Q1 untrusted-input posture, Q2 no-network, Q5 Whisker/Tentacles profile, Q6 sibling pointer files) are still open whenever convenient — one line each is plenty.

@ottlinger

Copy link
Copy Markdown
Contributor

@potiuk - thanks again:

  • Q5:as development on Tentacles/Whisker is rather low at the moment I'd personally prefer to start with RAT and add Tentacles/Whisker later in order to reduce noise.

  • Q2: AFAIK RAT does not open network connections as it runs locally; except for downloading stuff via Maven/Ant/buildtools in order to build the artifacts locally.

Comment thread THREAT_MODEL.md
Comment thread THREAT_MODEL.md Outdated
Comment thread THREAT_MODEL.md Outdated
Comment thread THREAT_MODEL.md Outdated
Comment thread THREAT_MODEL.md
Comment thread THREAT_MODEL.md Outdated
Comment thread THREAT_MODEL.md Outdated
Comment thread THREAT_MODEL.md Outdated
Comment thread THREAT_MODEL.md Outdated
Comment thread THREAT_MODEL.md Outdated
…d (external entities disabled, apache#679 hardens DOCTYPE), no-network confirmed (XSLT xsl:include caveat), correct archive path-handling (read to memory, no extract-to-disk → no path traversal), Whisker/Tentacles deferred
@potiuk

potiuk commented Jun 21, 2026

Copy link
Copy Markdown
Member Author

Thanks @ottlinger and @Claudenw — that's everything we needed. I've folded all your answers into THREAT_MODEL.md (pushed). Summary of what changed:

  • Trust posture (Q1) — confirmed and now (maintainer): RAT config (XSLT, config files, license defs, custom matchers) is operator-trusted; the scanned files are the untrusted surface. §2/§7.
  • No network (Q2) — confirmed (maintainer). Added your XSLT xsl:include nuance: the one operator-reachable way out is a remote xsl:include, and since stylesheets are trusted config that's OUT-OF-MODEL. §5/§8 Upgrade Apache Commons Collections to v3.2.2 #3.
  • XXE (Q3) — external entities are disabled, so §8 Gradle RAT: Add plainOutput, xmlOutput and htmlOutput options #2 is now a provided property (was tentative); noted PR RAT-560: changes to reduce XXE exposure #679 as the DOCTYPE-hardening follow-up, and the JAXP-system-properties configurability. §5/§5a/§8.
  • Archive bound (Q4) — kept as a disclaimed §9 gap (no bound, OOM not guarded).
  • Path handling — corrected a phantom risk: since RAT reads entries into memory and never extracts to disk, there's no zip-slip / path-traversal-on-write surface. An entry label like bar/baz.zip#/junk.txt is just a report string. §6/§9.
  • Whisker/Tentacles (Q5/Q6) — scoped this PR to creadur-rat per your preference; the sibling pointer files are a deferred follow-up.

With every §14 question answered, the model is ready to ratify whenever the PMC's happy with it. (The red check is the CodeQL "Build and analyze" job, which is unrelated to these doc-only changes — all 13 build/test matrix jobs pass.)

@ottlinger

Copy link
Copy Markdown
Contributor

@potiuk if we merge this PR we are done and trigger the next step or are there any other things that need to be done on our side? Thanks

@potiuk

potiuk commented Jun 26, 2026

Copy link
Copy Markdown
Member Author

@ottlinger @Claudenw — quick note on the red check: that's only the SonarCloud job, which can't run on PRs because the ASF doesn't share its Sonar credentials with PR builds (as you noted). It's not an actual test failure, so it's safe to merge past it. And yes — merging this is the last step on your side; once it lands I'll verify the AGENTS.md → SECURITY.md → THREAT_MODEL.md chain resolves and Creadur/RAT enters the queue. (Starting with RAT and adding Tentacles/Whisker later is fine.)

@ottlinger ottlinger requested review from Claudenw and ottlinger June 26, 2026 21:23
@ottlinger

Copy link
Copy Markdown
Contributor

@Claudenw anything to add from your side? I'm fine with merging in order to go forward. Thx

@ottlinger ottlinger left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as a first draft

@Claudenw Claudenw left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have to spend more time thinking about the threat model. I think with license headers, and the XXE questions resolved we can proceed.

Comment thread THREAT_MODEL.md Outdated
Comment thread AGENTS.md Outdated
Comment thread SECURITY.md
Comment thread SECURITY.md Outdated
Comment thread THREAT_MODEL.md
Comment thread THREAT_MODEL.md Outdated
- THREAT_MODEL.md: use https for the license URL; clarify that --addLicense
  write mode inserts text from the operator-controlled license definition,
  not from the scanned tree.
- AGENTS.md, SECURITY.md: add the ASF license header.
- SECURITY.md: replace the TBD intro with a short security-posture summary
  and a "Reporting a Vulnerability" section.

Generated-by: Claude Opus 4.8 (1M context)
@potiuk

potiuk commented Jun 27, 2026

Copy link
Copy Markdown
Member Author

Thanks for the careful review, @Claudenw and @ottlinger — really appreciate it. All six threads are addressed in a6978b8 and resolved:

  • https for the license URL
  • ✅ ASF license headers added to AGENTS.md and SECURITY.md
  • SECURITY.md TBD → short security-posture summary + a Reporting a Vulnerability section (points at security@apache.org; detailed model stays in THREAT_MODEL.md)
  • ✅ §3 write-mode note now spells out that --addLicense inserts text from the operator-controlled license definition, not the scanned tree (thanks Claude for the "controlled input file" framing)
  • ✅ the Maven/Ant/CLI "generated front-ends" point was already captured in the §2 note — pointed there

Should be good to go now — shout if anything else stands out. Thanks again! 🙏

@potiuk

potiuk commented Jun 27, 2026

Copy link
Copy Markdown
Member Author

BTW. @Claudenw Usually (in order to save tokens) we either use no licence or we use short SPDX form of it - and usually exclude the AGENTS.md/SECURITY.md from release - to be 100% in line with expectation of full licences in released source packages.

We had a lengthy discussion about it - without complete conclusion, but the discussion seemed to converge on "If you are not releasing the .md files, it's fine to have no licence or SPDX or some other short version of it. This is generally OK - according to https://www.apache.org/legal/src-headers.html#is-a-short-form-of-the-source-header-available - where one such example short form is allowed in justified cases, while SPDX seems to be another good form. and we usually exclude those files in .rat excludes and ignore on source releases.

Long thread about it (with my assumptions summarized) can be found here - https://lists.apache.org/thread/j1tn63r2lf13v3d1tnnqff8fkcl4nx53

If you wish - we can use either approach.

Greeting from NY Claude :)

potiuk added 2 commits June 27, 2026 03:45
…e labels

Answers Claudenw's review note (does apache#679 impact the XXE data-flow line?):
the §5a/§8 text already records that RAT disables external entities + the
apache#679 DOCTYPE hardening, but the data-flow diagram and the input/residual
tables still labelled XXE a bare "surface". Annotate those three labels with
the mitigation so the diagram is consistent with §5a/§8 apache#2.

Generated-by: Claude Opus 4.8 (1M context)
Consistency with THREAT_MODEL.md (§5a / §8 apache#2): since RAT-560 (apache#679) RAT
builds XML parsers via the hardened StandardXmlFactory (DOCTYPE + external
entities disabled), so XXE is actively prevented. Lead with that; keep the
operator-trusted-config argument as defense-in-depth.

Generated-by: Claude Opus 4.8 (1M context)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants